-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Categorical(Index) passed as categories #17888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Categorical(Index) passed as categories #17888
Conversation
pandas/core/dtypes/dtypes.py
Outdated
@@ -316,7 +315,10 @@ def _validate_categories(categories, fastpath=False): | |||
from pandas import Index | |||
|
|||
if not isinstance(categories, ABCIndexClass): | |||
categories = Index(categories) | |||
categories = Index(categories, tupleize_cols=False) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be an elif and instead should use is_categorical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the point of the ABC ones to avoid imports from itself ? (as is_categorical
uses CategoricalDtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Can you answer this one? No problem to change it if you think it is better, but it seems to me using isinstance(categories, ABCCategoricalIndex)
is consistent with the use of isinstance(categories, ABCIndexClass)
just above (also in readability of the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, yeah I guess you have to do this. but you can also check isinstance(categories, (ABCCateoricalIndex, ABCCategorical))
is more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to use isinstance(categories, (ABCCateoricalIndex, ABCCategorical))
first and then only to check for Index
Thanks Joris, agreed with the changes here. Looks like this conflicts with your fastpath PR. |
Codecov Report
@@ Coverage Diff @@
## master #17888 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50105 50103 -2
==========================================
- Hits 45714 45704 -10
- Misses 4391 4399 +8
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17888 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50105 50105
==========================================
- Hits 45714 45706 -8
- Misses 4391 4399 +8
Continue to review full report at Codecov.
|
pandas/tests/test_categorical.py
Outdated
|
||
result = pd.Categorical( | ||
['a', 'b'], categories=pd.CategoricalIndex(['a', 'b', 'c'])) | ||
tm.assert_categorical_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Categorical
and CategoricalIndex
have been imported, so you can remove the pd.
here, if you feel that's cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks! (updated)
pandas/tests/test_categorical.py
Outdated
|
||
result = pd.Categorical.from_codes( | ||
[0, 1], categories=pd.CategoricalIndex(['a', 'b', 'c'])) | ||
tm.assert_categorical_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same regarding pd.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -1023,6 +1023,7 @@ Categorical | |||
- Bug in the categorical constructor with empty values and categories causing the ``.categories`` to be an empty ``Float64Index`` rather than an empty ``Index`` with object dtype (:issue:`17248`) | |||
- Bug in categorical operations with :ref:`Series.cat <categorical.cat>` not preserving the original Series' name (:issue:`17509`) | |||
- Bug in :func:`DataFrame.merge` failing for categorical columns with boolean/int data types (:issue:`17187`) | |||
- Bug in constructing a ``Categorical``/``CategoricalDtype`` when the specified ``categories`` where of categorical type (:issue:`17884`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this to the CDT issue, IOW the sub-section (as its actually not an independent bug) and better there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bug also exists for Categorical
and CateforicalIndex
and was already present in current released 0.20.3. So not directly related to the CDT rework (although the bug is present in CDT as well). But not that important, can also move if you want.
One disadvantage of the my changes is that is somehow relaxes the "unique" requirement for the passed categories. Because now you can pass a non-unique Categorical, and we don't raise anymore but just take its unique categories. |
@jorisvandenbossche ping when updated. |
As far as I am concerned, this is updated. Is there a comment I overlooked? |
@@ -326,6 +326,9 @@ def _validate_categories(categories, fastpath=False): | |||
if not categories.is_unique: | |||
raise ValueError('Categorical categories must be unique') | |||
|
|||
if isinstance(categories, ABCCategoricalIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here you an also accept ABCCategorical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a Categorical has already been converted to CategoricalIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh ok
git diff upstream/master -u -- "*.py" | flake8 --diff
@TomAugspurger I am not sure we had a discussion about that before, but to me it does not make many sense to have a CategoricalIndex as the categories of a Categorical/CategoricalIndex, so I think we should convert passed categories (which is what this PR is doing).
Alternative to fix #17884 is to fix the repr code to deal with Categorical categories.